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Abstract. We study the inverse problem of deducing the dynamical charac- 
teristics (such as the potential field) of large systems from kinematic observa- 
tions. We show that, for a class of steady-state systems, the solution is unique 
even with fragmentary data, dark matter, or selection (bias) functions. Using 
spherically symmetric models for simulations, we investigate solution conver- 
gence and the roles of data noise and regularization in the inverse problem. 
We also present a method, analogous to tomography, for comparing the ob- 
served data with a model probability distribution such that the latter can be 
determined. 

1. Introduction 

Matter in the universe is usually contained in systems bound together by grav- 
itation: planets and their moons, planetary systems, star clusters, galaxies, and 
groups of galaxies. Indeed, the modern concepts of gravitation and gravitational 
potential were brought about by the realization that a mathematically well definable 
universal force field must keep celestial bodies in their observed orbits. Newton's 
solution to the inverse problem of "What kind of a force keeps two pointlike bodies 
in an elliptic orbit around each other?" was the inverse-square law of attraction^. 
In such a force field, the equations of motion for a body can be written as 

d 2 x 

where t G R is the time, i£l 3 describes the position of the body, and $ : R 3 — > R 
is the gravitational potential; here <E>(a;) ~ 

Outside isolated two-body configurations, the two-body solution only serves as a 
useful first approximation for systems dominated by one massive body; for such a 
class of systems, the perturbations from this solution can be examined analytically 
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1 Newton actually solved the direct problem of "What kind of an orbit is produced by the 
inverse-square attraction?" (this is what Halley asked him). The solution of the inverse problem 
is an instant corollary; it is also unique for motion around the focal point of an ellipse - the 
harmonic oscillator creates an elliptic orbit around the centre. 
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to a relatively high precision. The behaviour of a system consisting of a moder- 
ate number of bodies of more or less equal weight must be numerically integrated 
assuming the Newtonian potential for each pointlike mass. However, if the sys- 
tem consists of many (> 10 6 or so) gravitationally interacting bodies collectively 
forming the potential, we can assume it to obey some principles of statistical me- 
chanics. Accordingly, we no longer analyze the motion of separate particles, but 
want to define the dynamical characteristics of the system as a whole, such as its 
global-scale potential field ^(x), matter density p(x), etc. We define dynamical 
tomography as the inverse problem of deducing these smoothly distributed charac- 
teristics from kinematic observations. A more detailed definition of this large-scale 
generalization of Newton's dynamical inverse problem is given in Section 2.1. The 
term tomography can be used here in both intuitive and technical senses: we de- 
termine density/distribution functions inside a domain, and this can be done by 
employing the principles of tomographic analysis. 

Large gravitationally bound systems have been studied for more than a century, 
and the corresponding theoretical framework is described in several textbooks, of 
which [2] is the most up-to-date one. However, the studies have mostly concen- 
trated on the direct problem of creating numerically or analytically well tractable 
dynamical models, or of designing approximate system models to explain various 
observed phenomena, while the general inverse problem (as posed in sections 2 and 
3 here) has received less attention. This is mostly due to the lack of data sufficient 
for a comprehensive analysis, and the scarcity of efficient methods of dealing with 
such data. New sky surveys such as LSST (Large Synoptic Survey Telescope, USA) 
and Gaia (EU), both starting at the beginning of the next decade, will provide a 
vast amount of kinematic data of the stars in our galaxy, making the construction 
of a robust machinery for inverse problem analysis essential. We describe here some 
theoretical and practical aspects of solving problems in dynamical tomography. 

The structure of the paper is as follows: in Section 2, we review as well as 
define some basic concepts of large gravitational systems, and pose the problem in 
corresponding terminology. In Section 3, we examine the fundamental aspects and 
uniqueness properties of the inverse problem, and in Section 4 present examples 
in the simplified case of spherically symmetric systems. In Section 5, we redefine 
the problem in the sense of probability distributions, which is usually necessary in 
practice. Finally, in Section 6, we present conclusions and discuss future work and 
applications. 

2. Defining large systems 

2.1. Distribution functions. In this paper, we assume the system to be bound, 
stable, collisionless (d ^> D, where d is the shortest distance between two bodies 
of size D) and in a steady-state equilibrium (for a discussion on these standard 
assumptions and their applicability regimes, see [2]). In practice, this is a good 
approximation for large stellar systems, much in the same way as the two-body 
solution is a good starting point for analyzing the solar system. The dynamics 
of such a system is completely described by the smooth phase-space distribution 
function f(x, v, t) of its constituent particles (stars), / : R^ x R^ x R t — > E, where 
v £ R 3 is the velocity. The function / pertains to the limits d — > (infinitely many 
particles), D — > 0, and D/d — > 0. In practice, / is interpreted as the quantity giving 
the expected number of particles in a phase-space volume via f d 3 xd 3 v, or as the 
probability density of finding a particle at (x, v). 
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The distribution function / essentially corresponds to fluid in six dimensions; the 
equation of motion for / is the collisionless Boltzmann (or Vlasov) equation 

(1) ^ + <«,V !B /)-(V$,V /> = Q 

(a special case of Liouville's theorem; see [5]). In terms of Poisson brackets, this 
reads df /dt = [H, /] , where H is the Hamiltonian. This implies that the flow 
of the fluid (the smooth motion of particle phase points through phase space) is 
incompressible. In this paper, we study the steady-state case where matter units 
move in their orbits in the potential field $ while leaving the collective system 
configuration completely unchanged: df/dt = = d$/dt, i.e., / = f(x,v) and 
<I> = (fr(x). Moreover, the observational data, measurements of (x,v) for stars or 
equivalent units of matter, are effectively for one epoch: though they are measured 
over several years, the orbital periods are millions of years, so the data do not 
provide orbital information outside the linearized regime v — dx/dt. 

We also define a set Q including all the other characteristics than x, v that can 
be assigned to a star from observations, such as luminosity, temperature, chemical 
composition, etc. These may be direct observables or implicit quantities such as 
mass and age that can be expressed in terms of direct ones. They can be continuous 
or even discrete (taxonomy) , in which case an integral over such a variable is under- 
stood in a suitable corresponding sense. Thus we have a full distribution function 
F(x, v; Q) of N+6 dimensions, where N is the number of the quantities in the set Q. 
At first sight, one might imagine that only mass could be a dynamically interesting 
quantity in Q, but, e.g., luminosity directly affects the observability and bias fac- 
tors. Also, as will be shown below, the possibility to partition the observed particles 
into different populations with the aid of Q adds to the information content. 

We can distribute f(x, v) further among the members of Q at each (x, v) with a 
function <p(x, v; Q): 

Definition 2.1. The full distribution function of number density in x and 
in the set Q of N quantities q (generic notation for a member of Q) is 

(2) F(x, v; Q) = f(x, v)ip(x, v; Q), 

normalized such that Jq ip(x,v;Q) d N q = 1. The marginal distribution of some 
Q € Q, qo < q < qi is 

(3) n,,.>;,) = /(*,»M^) = /(*,»)/ *,.>;W»-V 

JQ\q 

and Iqo L P < y x ^ v '^) d( l = l - 

Definition 2.2. The distribution function fi(x,v) of an object population Pi is 
given by 

(4) ft(x,v) = f(x,v)Pi(x,v), 
where the population filter Vi{x,v) is 

(5) Vi(x,v)= / ip{x,v;Q)d N q, 

J A, 

where A,; sets the (integration) limits for each observable of Q according to the given 
characteristics of Pi. 
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An important distribution function is that of mass per d 3 xd 3 v, denoted by 
f(x,v): 

Definition 2.3. The mass distribution function is given by 

(6) f(x,v)=f(x,v)M(x,v), 
where the mass factor M(x, v) is 

(7) M(x,v)= Mip(x,v;M)dM. 

Jo 

For example, the case of all particles having the same mass Mq is given by 

<p(x,v; M) = S(M - M ), 

where 5 is the Dirac delta distribution. Thus M. — Mq. 

The amount of matter in an infinitesimal volume of phase space R 3 , x R 3 is 
fd 3 xd 3 v, and the matter density p(x) in R 3 is 

(8) p{x) = f f(x,v)d 3 v, 

where the velocity domain V+ includes all velocities v that can exist at x in a bound 
system; alternatively, for v $ V + we define f(x, v) = and can integrate over all v. 
The matter density and the potential are related through Poisson's equation: 

(9) p(x) = ^V 2 $(x), 

where G is the universal gravitation constant. Note that if / is the distribution of 
all matter, equations (P), © and © must be simultaneously fulfilled. The search 
for such self-consistent solution pairs of /, $ is the fundamental problem of stellar 
dynamics. However, as we will show in Section 3, / (or /) does not need to cover 
all matter in our problem; it can describe a selected population, or there may be 
unobservable matter such that the full / cannot be determined while <3? can. 

The distribution function / is a statistical tool, meant to describe the average 
distribution of matter in large enough bins. As mentioned above, another useful 
practical interpretation of / is the probability density of observing a star (or, for 
F(x,v;Q), a star with properties Q) at (x,v); this circumvents the local-scale dis- 
tribution of matter in space, and the size of our averaging bins - the collisionless 
approximation means (and observations show) that most of space is void of lumi- 
nous matter. In section 5, we will use this interpretation to redefine the observables 
of our inverse problem. 

We can now state our problem of dynamical tomography as follows: 

Problem. Given a large number of observed (x, v) (for any motion markers such 
as stars or other matter and possibly in different populations Pi) in a domain Q C 
R 3 xl 3 in a gravitationally bound steady-state system, deduce the potential &(x) of 
the system, and the distribution function(s) f(x,v) of the observed matter in Q. 

Obviously, if the number of observations is extremely large and all matter in 
the system is observed, the problem is trivial. The number of observations in 
arbitrarily small phase-space volumes is now nonzero, so an accurate estimate of 
the distribution f{x,v) is obtained by simply counting the objects, assuming that 
their masses are known. From this, <i>(x) follows; one could, as a shortcut to $(x), 
directly count objects in R 3 only. What makes the problem an inverse one is that 
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neither of the requisites is fulfilled in reality: the number of observations, though 
high, is not sufficient for direct counting in small enough phase-space (or even R^) 
bins, and, above all, not all of the system matter is observed because of bias factors 
or non-luminous matter. This is why we need to consider R^ x R^ instead of just 
RiJ even if we only want to obtain the fraction of unobserved matter can 

only be inferred by using a priori information furnished by the expected dynamical 
properties of the system. Our goal is now to find a parametrization in which the 
dynamics and a priori constraints of the system can be expressed in a practical 
and preferably analytically tractable way. Here we neglect effects such as spatial 
correlation between stars, changing mass (shedding or accretion) due to stellar 
evolution, etc. [2]. 

Some basic properties of the system can already be deduced from a very small 
number of observations. For example, if we observe matter at distance r from the 
centre of the system, moving at speed v r relative to us, and assume it to be bound 
by the system, we can roughly estimate the mass M inside r, e.g., by assuming a 
spherical mass distribution and a circular orbit around the centre: M = v^r/G. In 
this and similar ways it has been deduced from several observations of galaxies and 
their clusters that most of the mass in the universe is contained in dark matter: 
non-luminous and extremely hard if not impossible to observe directly. 

2.2. Jeans theorems and integrable systems. As is well known, a natural and 
compact way of parametrizing the desired dynamical properties of the system can 
be achieved via its dynamical invariants. A function I{x, v), evaluated at [x(t), v(t)] 
of any orbit in the potential $ of the system, is an integral of motion if 

j t I[x(t),v(t)}=0 

for all orbits at all t. Evaluating the time derivative by the chain rule and substi- 
tuting the equation of motion fl]), we obtain an equation exactly of the form (TT|) (/ 
corresponding to /) with the steady-state condition df/dt = 0. Thus we arrive at 
the Jeans theorem (see, e.g., 0) that states that any steady-state solution / of (fTJ) 
is of the form /[Jj(x, u)], where Ii are integrals of motion. For example, the energy 
E(x,v): 

(10) E(x,v) = ^{v,v)+$(x) 

is always an integral in gravitationally interacting systems. 

To have a complete set of integrals which we can, in principle, reconstruct and 
thus properly use in our formulation of the problem, we further restrict ourselves to 
regular orbits and integrable systems (defined below), a subset of the steady-state 
solutions of JT]). This class of solutions, though restricted, has two great advantages: 

i) it facilitates an analytical or semianalytical investigation of the problem as well 
as the derivation of uniqueness results; and 

ii) it can be readily viewed as a further modifiable approximate representation of 
the full class of solutions. 

An orbit is regular (quasiperiodic) if it is confined to a 3-torus S 1 x S 1 x S 1 in 
phase space [T] . Then it has three integrals (also called isolating integrals) 7j, i — 
1,2,3 that can be used to define the torus: Ii(x,v) = where C t are constants. 
The isolating integrals can be given in an arbitrary basis as functions of Ii are 
isolating integrals as well. A proper basis set of isolating integrals maps each torus 
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TasTnJ,,!= 1,2,3, i.e., any such bases can be mapped one-to-one into each 
other: {Ifi «-► {I;}. 

If a potential Q(x),x S R 3 creates an integrable system, all orbits are confined 
to 3-tori in R 3 x R 3 that are each defined by three action (Poincare) integrals 
Jj, i = 1,2,3, a class of isolating integrals /: 

(11) h = ^£ (p,dq), 

where p € R 3 and q GM 3 are any canonically conjugate momenta and coordinates, 
and *Pi is a path that cannot be continuously deformed into a point. For other paths, 
the integral vanishes. There are three such sets of possible paths each producing 
one Ji\ see, e.g., [U[2]. J G R 3 plays the role of canonical momentum in Hamilton's 
equations, and its canonically conjugate coordinate pair is the angle variable 9 £ R 3 . 
(J, 9) are related to phase-space coordinates (x, v) by a canonical transformation, 
and energy (Hamiltonian) depends on J only: E = E{J). 

Action-angle formalism yields the time averages theorem [2] that states that the 
density of the orbit on the torus, i.e., the average time it spends on different parts 
of the torus, is evenly distributed in 9 on its surface. This holds for non-resonant, 
i.e., not closed orbits that make up almost all of phase space. This, in turn, leads 
to the strong Jeans theorem that essentially states that, in an integrable system, a 
steady-state solution of |T]) must be of the form 

(12) f(x,v) = f[h(x,v),...,I 3 (x,v)} := f[I(x,v)\, 

where Ii are isolating integrals. 

If we want a steady-state solution for the luminous matter in the system, the 
distribution function of our choice is the mass distribution f(x,v), i.e., we now 
assume that / = f[I(x, v)]. Thus, if we want / and / to be interchangeable in our 
steady-state equations and theorems to be derived in section 3, we must assume 
the mass factor M. to be of the form M. — M.[I(x, v)] as well, i.e., <p{x,v;M) — 
ip[I(x, v); M]. Leaving the masses unknown is the advantage in using /, but in 
practice one should solve the inverse problem with / as well by weighting the count 
of each observed point (xi,Vi) by the corresponding mass Mj, and then compare 
the results (e.g., the tori obtained). 

The fact that the system as a whole is a steady-state and integrable one does not 
generally imply that a population filter Vi{x,v) is of the form Vi[I(x,v)], or that, 
for any other q than the mass M, <p(x, v; q) = ip[I(x, v); q]. But if such filters (i.e., 
steady-state populations) can be assumed to be identifiable at least in some domain 
of Q and R 3 x R 3 , they are very useful in the inverse problem, as will be shown in 
section 3. 

3. Inverse problem 

3.1. Uniqueness. By virtue of the strong Jeans theorem, we now seek integrable 
systems that reproduce the observations. Since the orbits of all particles, observed 
or not, are on tori, and f(x,v) is of the form (|12|) . we know that each isosurface 
f(x, v) = const in phase space R 3 x R 3 of any steady-state f(x, v), regardless of the 
group of objects it represents, entirely consists of 3-tori on which I(x, v) = const. 
To show that these isosurfaces determine the potential $(x) of the system, let us 
first present the following lemma: 
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Lemma 3.1. Let the potential <fr(x) generate an integrable system, and let T denote 
the corresponding set of 3-tori in R 3 xR 3 . Then &(x) is the only integrable potential 
(up to an additive constant) that creates any chosen subset T of arbitrarily small 
patches T on any tori ofT such that T covers all o/R 3 (accessible to the system) 
in a connected manner. 

Proof. The value E of the energy (p~0|) is constant on each torus. Let us choose a 
patch Tq on which <&(x) = Eq — ^ (v, v) , where Eq is the energy on the corresponding 
torus. Thus is defined up to an arbitrary constant for all x of the torus. If we 
have another patch T\ with a point x = xq common with Tq, and the corresponding 
phase-space points on the two patches are, respectively, (xq,vq) and (xq,v'), the 
value Ei of the energy on Ti must be 

Ex = *(aj ) + v') =E a + v') - (v , v Q )). 

This defines &(x) on Ti: $(a;) = E\ — ^(v,v). This chain of patches can be 
continued to cover $(x) everywhere (up to the arbitrary constant chosen for Eq at 
the beginning). □ 

Remark 1. The connected manner thus means that T on different tori can be 
arranged in a sequence by common values of x as above. Actually, any patch on a 
torus can even be replaced by a set of two points that do not have to be close to 
each other: to establish the connected chain for $(a;), we only require two points 
with different x on one torus, and one of the x shared with a point on another torus 
so that all x are covered. 

Remark 2. The lemma can also be expanded to concern all potentials (not just 
integrable systems) by defining T to be sections of orbits having common points x. 
In fact, just one chaotic orbit is sufficient as it eventually defines $(x) at all x £ R 3 . 
More generally, T can denote parts of any structures on which E is constant, or 
parts of isosurfaces of any functions of the form f(E). Note that an isosurface 
need not be connected, i.e., f(E) need not be monotonous. One value of / can 
correspond to more than one value of E as long as the branches of different E can 
be identified: T <-> E. 

The lemma states that even highly fragmentary information on the shape of 
the tori in phase space is well sufficient to determine the integrable potential <&(x) 
uniquely. Now, let independent distribution functions fi(x,v), i — 1,2,3, in an 
integrable potential <E>(x) be defined everywhere in R 3 xl 3 . By the strong Jeans 
theorem, the isosurfaces fi(x,v) — const are of the form f[I(x,v)] — const, where 
I{x,v) = const define the tori created by $>(x). The 3-surfaces formed by the 
intersection of three 5-surfaces fi{I) are 3-tori (defined by / as well). Then, by 
the lemma, any collection of parts of surfaces fi = const sufficient to determine a 
connected chain of torus patches uniquely determines Q(x). We can now state the 
following uniqueness theorem: 

Theorem 3.2. Let three independent steady-state distribution functions fi(x,v), 
z = 1,2,3 (f : K 3 X R — > M.) of matter in an integrable system be defined in some 
common domains o/R 3 x M 3 such that a set of parts of the surfaces fi(x, v) — const 
forms a succession of torus patches connected in x. Then the fi(x,v) uniquely 
determine the potential &(x). 
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Remark 3. We can pose this in the integral space R 3 as well: we assume that 
the fi(I), I £ I 3 , are such that the set of equations {fi(I) — Cj} has a nonzero 
and finite number of solutions / for sufficiently many {Ci} such that a connected 
sequence of patches can be constructed (both / and fi(I) are, of course, unknown 
prior to the determination of <fr(x)). Note that the number can be larger than one, 
i.e., fi(x,v) need not form a proper basis of isolating integrals: if a set {fi = C{\ 
corresponds to more than one torus (more than one set of {Ii = C^}), it is sufficient 
to be able to distinguish the different tori in the sense of the lemma (cf. remark [5] 
of the lemma) . 

Remark 4. The theorem emphasizes the role of different distribution functions 
fi(x,v) in providing information about the system. If different fi based on the ob- 
servations of various steady-state object populations are available, we gain more by 
studying /; rather than the total distribution function f tot = J2i fi °f au luminous 
mass. 

Remark 5. Most of all the feasible ways of filling R 3 x R 3 with invariant tori, 
each with a constant J of (fTTj) . are not derivable from any potential (cf. [7]). Thus 
there usually is no integrable potential (f> other than $ that, with its tori and 
some distribution function /', would yield f'[I<p{x,v)\ = f[I$(x,v)] everywhere in 
R 3 xM 3 , so in most cases one distribution function f(x,v) in R 3 x R 3 determines 



The theorem is constructive for three /'s from which we directly get (fr(x) with 
the procedure of the lemma, while the case of one / (remark[5]) is non-constructive. 
Finding the integrable &(x) corresponding to one / is not obvious since there are 
no general procedures for finding and exploring integrable potentials [10] . The set 
of potentials giving rise to integrable systems is known to be a vanishingly small 
subset of all potentials [10], and there is no such e =^ that, for an integrable 4>(x), 
4>{x) +etp(x) is still integrable for an arbitrary <p(x). In practice, we can circumvent 
this difficulty by allowing the use of non-integrable potentials and approximate tori, 
i.e., we construct a set of approximate integrals I(x, v) for a potential $(x) [6]; this 
approach will be discussed in section 6. 

The importance of isosurfaces in an integrable system is that they relieve us 
from having to record the time evolution of orbits. If the orbital tori are definable, 
we only need to probe phase space at one moment. It is important to note that 
f(x, v) does not need to cover all matter in the system (then the single-/ uniqueness 
would be trivial for all systems): it can represent any fraction of the matter, even 
virtually massless test particles, as long as it is defined on all tori and it has settled 
to a steady-state equilibrium. This underlines the fact that we are interested more 
in the geometric structures in R 3 x R 3 (in practice, in a sufficiently large domain 
of observations) rather than the actual distribution function / itself. 

The fact that we do not need to observe all of the matter in the system to 
determine $(x) is of crucial importance. Because of the abundance of dark matter, 
a large part of all matter will not be seen even in ideal conditions. If, for example, 
/ represents all luminous matter, we immediately obtain the mass density pd(x) of 
dark matter by subtracting ([8|) from (J9j) : 



(13) 
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Another cause of unobservability is the inevitable fact that our instruments are 
not sensitive enough, or that, e.g., interstellar dust prevents us from seeing some 
regions of space properly. Such effects can be described by the bias or selection 
function j(x, v), < 7 < 1, that represents the fraction of observable matter of 
f(x,v) in d 3 xd 3 v. The potential <j>(a;) can be determined uniquely even in the 
presence of 7(22, v) as we immediately can see if we suppose that we have independent 
fi(x,v) > available that share the same j(x,v), since now the ratios of /, can be 
viewed as a new basis of isolating integrals: 

Theorem 3.3. Let the products gi{x,v) — 7(2:, v) f%(x, v) > 0, i = 1, . . . ,4, of a 
bias function < 7 < 1 and four independent steady-state distribution functions fi 
be defined in R 3 x R 3 . Then the three ratios 

9i{x,v) _ fj(x,v) 
9i(x,v) fi[x,v) 
uniquely determine the potential as in theorem VS . 6 A 

Remark 6. As above, this emphasizes the combined information content of dif- 
ferent distribution functions. Analogously with remark [SJ we can stipulate for two 
fi that in most cases two products gi(x,v) = ~/(x,v) fi(x,v) > 0, i = 1,2, de- 
fined everywhere in R 3 x R 3 , determine <j>(x) via their ratio g^ix^v)/ gi{x,v) — 

Remark 7. The bias function 7(2;) may depend on x only; 7 : R 3 — > R. In 
such cases the additional factor 7(2;) does not really increase the possibility of the 
existence of another integrable potential <j){x) and bias function £(x), < £ < 
1, such that £/' = 7/ everywhere. This would require there to be an isolating 
integral of 4>{x) that everywhere matches an isolating integral of another potential 
((fr(x)) when multiplied with a function of x only (£(x)/"f(x)), while general isolating 
integrals are mixed functions of x, v, and individual potentials. No examples of such 
transformations are known. Thus, usually the single product r y{x)f{x, v) determines 
the potential $(x) of the system. 

The above results mean that dark matter, selection bias, or fragmentary data are 
no fundamental obstacles to dynamical tomography. It should be noted that, for 
near-integrable systems in the sense of Kolmogorov-Arnold-Moser (KAM) theorem 
PQ, most of our results for integrable potentials still hold to a high precision. This 
is because they are based on the toroidal topology of motion in phase space, and 
almost all motion in near-integrable systems still occurs on tori that can be viewed 
as perturbed versions of the invariant tori of an integrable system. Even though the 
geometric structure of orbits in phase space changes radically as we move away from 
integrability, allowing chaotic orbits and substructures of foliated tori around closed 
orbits, the transition is smooth dynamically, i.e., motion is still mostly quasitoroidal 
within a given resolution |10j . 

The solution of the inverse problem is based on the large number N of observed 
(x,v), from which the distribution function f(x,v) can be estimated. With smaller 
effective number of phase-space dimensions than six, i.e., in the case of potential 
symmetries, N may be sufficient for actual binning in sufficiently small volumes 
AxAv in the reduced phase space. Thus we sample f(x,v) directly by counting 
and can take it to be our observable, as in section 4. For six dimensions, we can, 
e.g., take random samples of marginal distributions to be our observables, as will 
be discussed in section 5. 
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3.2. Self-consistency regularization. If we assume / to represent all matter 
in the system, the self-consistency requirement of / and «3> can be used as regular- 
ization. For example, we can minimize the discrepancy between ([5]) and © at each 
space volume with the function 

A J \p D (x)\d 3 x, 

where A is a suitable weight factor, and pd(x) is given by (fT3|) . In practice, this 
can be replaced by 

(14) \J2[PD(x t )} 2 , 

i 

for a large number of test locations Xi, written in a manifest x 2 -form. As will 
be discussed below, it is sometimes better to compare potential values rather than 
densities at test points due to numerical instabilities in evaluating Poisson's equation 
directly. 

If any amount of dark matter is allowed, no regularization such as (|14[) can 
be used if there is no information on the masses of the stars, i.e., observations 
correspond to the number density, and the estimated mass would only be based on 
the regularization. This is easy to see by considering the case of constant density 
fraction of luminous matter: p\ U m{x) I (pd + Plum) = const < 1 everywhere in M. 3 . 
Regularization of the form (fl4|) would then always yield exactly po — 0, regardless 
of the true density ratio. If masses are included in observations, i.e., we would solve 
for / even without regularization, ()14|1 can be employed, though minimizing the 
amount of dark matter is not necessarily an appropriate principle then. 

4. Examples in spherical symmetry 

Spherically symmetric systems have the advantage of being integrable, as now the 
energy E(x, v) as well as the three components of the angular momentum P(x, v) = 
x Av are isolating integrals; the fact that there are four integrals instead of three is 
reflected in that each orbit is constrained to a plane (e.g., The original Jeans' 
theorem applies only to systems without such degeneracies, but it can be extended 
to the spherical case [11], stating that any steady-state distribution function in a 
potential $(r), where r = \\x\\ is the radius, must be of the form f(E, P). Further, 
since the system is spherically symmetric, the direction of P is uniformly distributed 
over the unit sphere S 2 , so we are only interested in functions of the form f(E, L), 
where 

(15) L(x,v) = \\x Av\\. 

It is customary (e.g., [2]) to use the concepts of relative potential and energy \P 
and £ such that 

(16) V(x) :=-$(x)+$ , £{x,v) := V(x) - -(«,«) = -E(x,v) + $ , 

where $ is chosen such that / > for £ > and / = for £ < (usually, $ = 0) . 
We now study potentials and distribution functions of the form f(£,L). In 

this section, we simplify the notation by using f(£,L) directly to denote the mass 
distribution function (i.e., /) as it is the form the analytical treatment here always 
refers to. For simplicity, let us also assume all the stars to have the same mass m. 
The systems as well as inversion methods used in this section are simple ones, as 
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the goal is to investigate the basic properties of the inverse problem rather than 
develop a full-fledged analysis machinery. 

Our phase space is now essentially reduced to radius r and velocity v in a plane. 
Because of this reduction in variables, spherically symmetric cases offer analyti- 
cal tractability to a much larger extent than other systems. Of course spherical 
symmetry rarely occurs in actual gravitational systems, but it is highly useful for 
exploring the basic properties of the inverse problem (uniqueness, stability, the ef- 
fect of sampling density, convergence of solution, etc.) in practical computation. 
Indeed, because of their analytical prospects, spherical systems and distribution 
functions of the form f(£, L) have enjoyed ever-continuing popularity in theoretical 
dynamics for over a century. 

Since now we know the exact functional forms of the isolating integrals, we 
can immediately see that the 7(x)-uniqueness of remark [7] holds exactly for any 
nontrivial f(£ , L) and , J , (r). All distribution functions f{x, v) now have to be of the 
form f[£(x, v), L(x, v)], and obviously no function f[£(x, v), L(x, v)] is separable in 
the form h(x)g[£(x, v), L(x, v)]. Thus there is no bias function 7^ 7(2:) allowing 
a product ^(x)fL(x,v) to be identical with 'y(x)f<s ! (x, v) everywhere (note that the 
bias function 7 does not have to be spherically symmetric). Thus we can state the 
following uniqueness theorem: 

Theorem 4.1. If a bias function 7(2;) depends on x only: 7 : R 3 — > R, the product 
r y(x)f(x,v), defined everywhere in R 3 x R 3 in a spherically symmetric steady-state 
system, uniquely determines j(x). 

Remark 8. Similarly, the restricted form f[£(x, v), L(x, v)] corroborates remark[5] 
algebraically. 

Remark 9. Stackel potentials T(x) [2jlH[5] are a class of integrable potentials for 
which the Hamilton- Jacobi equation is separable in ellipsoidal coordinates. Just 
like in the spherical case, the functional forms of their isolating integrals in the dis- 
tribution function f[E(x, v), Iiix, v), I^{x, v)] are known exactly |4j. Examining the 
forms of the isolating Stackel integrals, we can immediately see that corresponding 
uniqueness results hold for all steady-state systems governed by Stackel potentials 



The simplest nontrivial distribution functions are obtained by just removing in- 
dependence and using isotropic f(£). The solution for a spherically symmetric 
system whose f{£) is self-consistent with its density p(r) and the corresponding- 
potential ^(r) is now simple to obtain. By writing ([8]) in spherical symmetry and 
changing the integration variable v — > £, we obtain an Abel integral equation for 
/(<?), the solution of which is the well-known Eddington's formula :2 



where \1/ is used as the argument of p via ^(r) — > r(^)\ see [5] for the conditions 
for f(£) to be non-negative, etc. Note that (fT7[) can naturally be used for non- 
sclf-consistent distribution functions (systems containing dark matter) as well. For 
example, we can solve for the distribution function /i um (£) of luminous matter 
from the luminous fraction of the total density: p\ um {r) = g{r)p(r), with a given 
< g(r) < 1, thus replacing p — + gp in (JT7J) while retaining the ^(r) corresponding 
to p(r). 

Inverse Problems and Imaging Volume X, No. X (200X), X-XX 



T(x). 



(17) 




12 



Mikko Kaasalainen 



With (fT7|) . we can simulate observations of the distribution function of a spher- 
ically symmetric system, and then numerically check how well its potential can be 
recovered. If self-consistency is used for regularization, Poisson's equation (J5|) is 
simple to integrate, so the regularizing function corresponding to (|14p . but in terms 
of potential rather than density, can be given as 



(18) 



( ¥(r«) - 16tt 2 G[- / / f[£mr'), u)]u 2 dur l2 dr' 

■ K r i JO JO 



/[£($(/), u)]u 2 dur' 2 dr' 



where u := ||v||. When using gradient-based methods in optimization, integrals of 
the form (|18p must be differentiated algorithmically for consistency, i.e., by taking 
the derivatives of the numerical algorithm for evaluating (|18[) rather than first taking 
the analytical derivative of (|18|) and then evaluating the integral numerically. 

The lemma (remark^]) and theorem 14.11 ensure that there is a unique solution for 
both the potential 'J'(r) and the bias function j(x). The % 2 -function of our inverse 
problem can now be represented as 



(19) x 2 = ]T [Nam - jf ~f(x)f[£(y(r),u)]d 3 xd 3 v 



2 + A X 2 



rcg ■ 



where Ni is the number of stars observed in the region f2j, and m., = m. To ensure 
positivity, let us model our potential and distribution function as 

(20) *(r)=exp(5>r*), f(£) = exp 

z i 

We could include an explicit 1/f-term in the exponential sum for /(£) to make 
sure that / — > as £ — > 0, but this turns out to be neither necessary nor useful 
in practice. Because of the adopted form of ^(r), it is especially useful to employ 
(|18p rather than density regularization as Poisson's equation would introduce a 
singularity for one term. 

Simple generalizations of the r _1 -type Kepler potential of a point mass already 
yield interesting examples of systems of continuous mass distribution. In fact, dy- 
namical systems whose potentials decrease much faster than this exhibit instability 
problems [2]. This is another reason why a number of variations of the r _1 -theme 
have been developed. For example, the isochrone potential 

. . GM 

(21) *' c(r) " YTTWTTf 

where M is the mass of the system and b > a scaling constant, with its density 
pair pic(r) given by Poisson's equation, yields the distribution function [5] 



fw(S) 



M \[I 



(22) 



[Z2GMbf/ 2 (l - £) 
64£ 4 + 3(16£ 2 + 28£-9) 



27 - 66£ + 320£ 2 - 240£ 3 



4 o/iec2 ioc n\ arcsm 



I £{!-£)- 

where £ — £b/(GM). We will use this in our numerical examples below. 
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Figure 1. (a) \&(r) and (b) /(£) of the isochrone potential (solid 
lines), together with the best-fit models from observed binned 
f(x,v) (dashed lines). 



We simulated observed Ni by creating 400 randomly distributed fij in (r, u)- 
space (the extents of each Qi about 1% of the maximum widths in r and u, with a 
suitable cutoff value r max 3> b) and computing the amount of mass in them from 
fjc- Any distribution function corresponding to an r -1 -type potential drops fast 
as r increases; thus those Qi close to r = contain most of the mass and dominate 
X 2 if the volumes of fli are equal. A more sophisticated observation sampling could 
use increased volumes away from r = 0, but in any case this sampling is artificial 
(as is the symmetry setup) and designed rather to probe the numerical properties 
of the problem than to simulate realistic conditions. 

The objective function (|f 9[) was minimized with the Levenberg-Marquardt grad- 
ient-based procedure [13]. The initial guess for \I>(r) was based on the top velocities 
at each r, and the initial /(£) simply had a correspondingly scaled zeroth-order 
term and a first-order term anticipating / — * as £ — > 0. Even with this crude 
initial values, the procedure converged well towards the correct minimum, i.e., % 2 
does not have many local minima far away from each other in the parameter space 
at least with reasonably low truncation degrees of (]20|) . The minimum is not par- 
ticularly sharp, i.e., there are many virtually as good solutions (x 2 -level varying a 
few percent) close to each other even in the noiseless expected because of 

the insufficiency of the model ("model noise"), in particular near the centre r = 0. 

The solution was stable to typical error levels of data: adding 5% or even 10% 
noise to the data values resulted in virtually the same ^f(r) and /(£). An expo- 
nential bias function j(x) = cxp(— \\x — xq\\/Rq) with the centre at different values 
of xo was well solved for by the procedure, together with \&(r) and /(£). Fig. 1 a 
and b show the solution ^f(r) and f(£) (dashed lines), obtained with coefficients 
up to degrees 4 and 5, respectively, against the actual VP/c and f/c (solid lines) 
with GM = 1 and b — 0.5 (r max = 10), a noise level of 5%, and a bias j{x) with 
Ro = 5 and ||a;o|| = 2. With the bias, the sample cells in were defined by r and 
the polar angle 6 (xo was placed at r = 2 and 6> = 0). 

Including the regularization term (Tl8| for a self-consistent case improved the \&(r) 
fit at large r; as a counterexample, Fig. 1 a shows the case without regularization, 
corresponding to allowing dark matter; with strong regularization, the lines essen- 
tially coincide at all r < r max . The slight "overshooting" of \t(r)- and /(£)-models 
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Figure 2. (a) A chosen f(£,L) for the isochrone, and (b) the 
best-fit model. The plots are in (£/£ m , L/L m (£)-space. 



at the centre of the system (at r — and £ ~ \&(0)) is partly an intrinsic limitation 
of the polynomial model. 

The form f(£.L) for the distribution function describes the anisotropy of ve- 
locities; there are now infinitely many f{£ , L) corresponding to a given density 
p(r). Creating general functions of this form even for spherical symmetries is not 
as straightforward as Eddington's formula for /(£). The Osipkov-Merritt proce- 
dure [2] is a standard technique for creating a family of self-consistent distribution 
functions with an adjustable anisotropy parameter, but it is heavily restricted as it 
still essentially uses Eddington's formula by fixing a combination IA = £ — cL 2 and 
stating / = f(U). 

In our case we can (and must) allow the existence of dark matter, so we can carry 
out the simulations without the self-consistency requirement. Thus we can simply 
use, e.g., 

(23) f(£,L) = f (£)h(£,L), 

where fa{£) is any suitable isotropic function, and the function < h(£, L) < 1 can 
be chosen to represent any desired dynamical characteristics of the observed matter 
(anisotropy of angular momentum distribution in various regions, etc.) via some 
adjustable parameters. The density p(r) from this is always p(r) < po(r), and the 
difference po — p is attributed to dark matter; whether or not this interpretation is 
realistic is not relevant to this study. 

Suppose we want to describe a system with central orbits mostly radial and those 
reaching high radii mostly tangential. Then we can choose, for example, 

(24) ^^ihi'-^Him + £)]}< 

where < d < 1 is an amplitude factor, £ m is the maximal £, and the maximal 
angular momentum L m = ru(r,£) occurs for given £ at f for which dL m (r) / dr = 0, 
i.e., L m (£) is obtained from 

dr 

where ^a(r) corresponds to po{r). 



(25) 2£ = f — j-^- + 2^ (f) => f, L m (E)=ry/2(V (r)-£), 

dr 
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Using the isochrone for fo(£) = fic{£) and setting G = 1, we have 

(») H<e>»-"»(^ + ^-»). 

Fig. 2a shows the (suitably normalized) f(£,L) from fic(£) (as with the example 
of Fig. 1) and (f24|) with an amplitude d = 1/2, plotted in the (£/£ m ,L/L m (£))- 
space for convenience. The plotted surface corresponds to [0.05, 0.95] x [0.05, 0.95] 
so that the behaviour of f(£ , L) can well be seen without the shrinking L-ranges 
due to (f2l))) . The inverse solution, based on 4000 bins distributed in (r, w)-space in 
the same manner as in the example of Fig. 1, and with the same computational 
procedure for solving the inverse problem, is shown in Fig. 2b. The (r, u)-space is 
here given by the dimensions (r, v r , tu), where v r = dr/dt is the radial velocity and 
the tangential velocity is v<j, — \/u 2 — v% , i.e., L = rv^. 

With the dark-matter scenario, we have no self-consistency regularization (and 
m = 1 in the /(£, L)-version of (|19p ). The model for f{£ , L) was taken to be 

(27) f(£,L)=e X p(J2bi j £ i v), 

ij 

with the largest degree of 5 for both £ l and L 3 . The two-dimensional polynomial 
series may not be the best general representation for /(£, L) (cf. the intrinsic limi- 
tations mentioned with Fig. 1), but in this case it at least appears to lead essentially 
to as good a convergence and solution as in the one-dimensional polynomial case. 

5. Distribution function as probability distribution 

To be able to solve the inverse problem in terms of probability distributions and 
without binned sampling, we need to define how to compare a model distribution 
and observations, i.e., a sampled realization of some probability distribution. Direct 
comparison methods of distributions are based on cumulative distributions as the 
latter are well-defined non-binned observables. However, cumulative distributions 
are uniquely defined only in one dimension though there have been some attempts at 
two and three dimensions; see |13j and references therein. This dilemma is solved 
by noting that the one-dimensional marginal distributions of our f(x,v) exactly 
correspond to the usual line projections in the projection-slice theorem, Radon 
transform and related tomographic analysis; see, e.g., [3, 9] and references therein. 

In general terms, let f(x), x S E , be a probability distribution, and let R, an 
N x N orthogonal rotation matrix, det R — 1, describe a linear transformation to 
a new basis in R N : w — Rx. The marginal distribution function h(z) along z, any 
one of the new coordinate axes denoted by the set W(R), is 

(28) h(z) = f f{R- 1 w)d N - 1 w, 

JW(R)\z 

and its cumulative distribution function C[z) is 

(29) C{z) = f h{z')dz', 

usually with z m - m — > — oo. Since h(z) = dC(z)/dz uniquely defines h(z) from a 
given C(z), we know from tomographic theory (hence this inverse problem can be 
called dynamical tomography) that: 

The probability distribution f{x), x € R N , is uniquely determined by the cumu- 
lative distributions C{z) of its marginal distributions h(z) along all line directions 
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in M. N that define the coordinate axis directions of z. For example, in M. 2 the lines 
are denned by all possible values of the rotation angle 6 S S 1 , while in R 3 they are 
given by all the directions ui E S 2 . 

Two one-dimensional distributions can be compared via their cumulative distri- 
bution functions. In the case of observations and a model, we denote the obser- 
vational distribution of K observations at Zi (arranged in ascending order by z) 

by 

(30) S K (z) = i, Zi < z < Zi+i, 

for number density, or, if mass is included in our problem, 



(31) Sk(z) — rrij, Zi < z < z 



=i 



with Sk{z) = for z < z\. Now we are using cumulative distributions Sk as our 
observables. The comparison between Sk{z) and a model C(z) can be done with a 
number of norms; with the usual L-2 we have ^-comparison. Another choice often 
used is the Loo-norm giving simply 

©= max \S K (z) - C(z)\, 

— oo<s<oo 

with which one can use the Kolmogorov-Smirnov (K-S) probability < P < 1 of 
matching marginal distributions |14U13j . when both Sk{z) and C{z) are normalized 
to unity by 

s K (z)^44^, c( Z ) > c{z) 



S k {zkY G[z max y 
usually with z max — > 00, and one normalizes the obtained / afterwards such that 
/ f d x = K. This, however, distorts the comparison as the high-z ends are now 
always matched at unity at the comparison stage. Also, K-S probability P is defined 
by a series with each term a power of 

d KS = exp(-ivS) 2 ), 

which means that, for a large number of observations K, the model C(z) must be 
very close to the observed Sk{z) {KD 2 < 10) for the formal probability P(dKs) 
to have any computationally useful level above zero. Even if the observations were 
perfect, the model cannot reach such a fit, so usually P(dKs) — ¥ regardless of the 
model. 

It is thus practicable to use the x 2 from the comparison pairs at each Zf, if K 
is very large, the comparison can be done for a smaller set of points in each z via 
pruning or interpolation as the curves of Sk(z) are now so smooth that this loses 
virtually no information. Thus we can measure the x 2 -sum of marginal distribution 
comparisons at each chosen coordinate line direction i and its points zf. 

(32) X 2 =Y J [S${z ] )~C^{z ] )] 2 . 

In principle, we could use an arbitrarily large number of projection lines z"> as, 
in contrast to standard tomography, there are no limiting factors for choosing them 
(except for computation time). However, we can expect that even a very limited 
number of projection lines is compensated for by the strong a priori information 
(or assumptions) on f(x,v). As is known from, e.g., limited-angle tomography, 
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incorporating even simple a priori information is a powerful means of enhancing the 
solution of the inverse problem [3] . 

We can, for example, expect that simple rotations in each of the 15 two-dimensional 
coordinate planes in R^ x R^ are sufficient for forming a set of projection lines. Now 
two rows of the 6x6 coordinate transformation matrix R, defining two choices for 
z, correspond to the rotation through some angle 9, while for other rows R is an 
identity matrix. We evaluate each hi{z) corresponding to the choice of z ordered 
by i using w — Ri(x,v) T : 



and find, via (z) and C^(z), the best model parameters of f[I$(x, v)], 

and 7(2;) minimizing l|32[) . 

We can use the x 2 -formalism to examine different parts of R^ x R^ with different 
weights, using [z m i n ,z max ] other than ] — oo,oo[. For example, we can zoom in on 
parts far away from the centre and fit their cumulative distributions in separate 
X 2 -terms - otherwise their effect on x 2 would be negligible compare to centre parts. 
Also, with marginal distributions we can even use data lacking some coordinates 
or having poor accuracy in them. For example, if some observations have only 
recorded x without v, we can still use them in marginal distributions in x. 

While a one-dimensional probability distribution is easy to sample using the 
inverse function of its cumulative distribution, creating a corresponding multidi- 
mensional coordinate transform x <-> y , x,y € R w , such that uniform random 
sampling in y would automatically create a correct distribution in x is difficult and 
not always realizable. Thus we have to resort to a selective Monte Carlo sampling 
algorithm in simulations: 

Sprinkler algorithm for Monte Carlo sampling an TV-dimensional dis- 
tribution f(x), f : R N — > R. 

1. Draw a random point x G ft (uniform distribution in M. N ), where ft is the 
desired region ft C M. N . 

2. Evaluate the probability 



where u>(x) C ft is a finite but small integration region around x, similar (with 
the same volume) for each drawn x. 
3. Draw a random number < q < 1 (uniform distribution for g 6 R). The 
point x is included in the sample set S if q < p(x)/p m , where p m — maxp(a;), 



4. Return to 1 until S is large enough to approximate f(x). 

As N increases, the ratio between included and drawn x decreases for peaked 
distributions, so for N > 3 this method gets slower than for low dimensions; in such 
cases faster Monte Carlo sampling can be carried out with Markov chain based 
MCMC tehniques [9]. 

As an example, we created 10 5 random samples of the fic(£) of the previous 
examples in the (r, u)-plane in the above manner. The distribution function /(r, u) 
in the (r, u)-plane is given by 





X G ft. 



(34) 



f(r,u) = 167rV u 2 f[£(r,u)], 
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Figure 3. Marginal and cumulative distributions in (a) r and (b) 
u from the isochrone distribution [h in solid line, Sk from samples 
in dashed line), and the distributions with the best-fit model [h in 
dotted line, C in dot-dash). 

and its marginal distributions in r and u are shown as solid lines in Fig. 3 a and 
b, suitably normalized to scale between and 1. The cumulative distributions Sk 
in r and u from the random samples are shown as dashed lines (again suitably 
normalized). The best-fit model solution C to (j32j) . with $(r) and /(£) of (|20| 
as earlier, matches these well (dot-dash); its marginal distributions are shown as 
dotted lines. The solution is virtually the same as in Fig. 1, so the tomography 
of the probability distribution is a practical approach. In particular, just the two 
distributions Sk{t) and Sk(u) were already sufficient for obtaining the solution: 
due to the highly restricted form of f(x,v), no more projection lines were needed. 

6. Conclusions and discussion 

We have defined the inverse problem of dynamical tomography and shown that, 
with suitable assumptions and mathematical tools, the problem is well-posed and 
solvable. The uniqueness theorems are central to dynamical tomography: they 
demonstrate that the steady-state assumption allows a unique solution even with 
fragmentary data, unobservable mass, and observational bias functions. 

Another important concept is the possibility to approximate general (or at least 
near-integrable) systems with integrable ones. Near-integrable systems that are not 
very old, but old enough to have settled to a quasistable state, appear closer to 
integrable than old ones as phase-space diffusion (Arnold diffusion) is not extensive 
yet. As Nekhoroshev's theorem as well as semianalytical approximations and nu- 
merical estimates show [10] . the time scale for such diffusion grows fast in inverse 
proportion to the distance of the initial phase-space point from a KAM torus. For 
example, in [8] it was shown that even in chaotic zones it is possible to construct 
approximate tori such that orbits of the system resemble perturbed motion on these 
tori for small enough time intervals. 

Torus construction [6] is a well-defined concept for near-integrable potentials [12] : 
if a Hamiltonian system is near-integrable in the sense of the KAM theorem, it is 
integrable on a Cantor set consisting of the surviving KAM tori of the system, 
i.e., it is possible to construct tori such that their defining action integrals indeed 
parametrize a global, ordered set. Via torus construction, we find a potential $(x) 
for which we can create an approximate set of tori (and determine expressions for 
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their action integrals J and thus for distribution functions /[J($(x); x, v)\) defining 
an integrable system such that the steady-state integrability assumption used here 
agrees with the observations as well as possible. Note that itself does not 

have to be integrable or even near-integrable; the torus set constructed defines an 
integrable Hamiltonian that is not usually derivable from a potential, so we never 
get an approximate integrable potential in the first place. This gives the derived 
some additional flexibility; indeed, any integrable potential approximating a 
real galaxy is probably not a very good representation. The key principle allowing 
the flexibility is the same as in the uniqueness theorems: we look for isosurfaces in 
phase space best explaining the observations. The goodness of our solution is 
not directly measured by its closeness to an integrable system. 

Our $(x) should thus be able to mimic the real potential quite well in cases of 
near-integrable or even somewhat chaotic but not very old systems. The feasible 
potentials $(a;) that can reproduce, via the torus-construction principle, the approx- 
imate isosurface structure of the observed matter distribution can all be expected 
to be close to each other. In fact, even if were integrable, we would have to 
use torus construction to find it even if we started with an integrable initial $o(^)i 
since the iteration procedure (or equivalent) for fitting a model to observations will 
generally explore non- integrable potentials <i>(x). 

Let us denote by Hq the integrable Hamiltonian corresponding to the tori con- 
structed for $(x), and by H the Hamiltonian of <5>(x). For optimal Ho, the difference 
(in some chosen norm) 

\\sh\\ = \\h-h \\ 

is as small as possible over the whole phase space; we call the corresponding tori 
the optimal tori of <&, and Hq the optimal Hamiltonian of $. If $ is integrable, then 
its optimal tori are its invariant tori: SH vanishes everywhere. For our purposes, 
\\SH\\ does not have to be small (although the smaller it is, the better). 
We surmise that: 

i) The optimal tori constructed for $(x) form a map $ — ► Hq. With the same 
construction scheme, a potential $(x) + e(j>(x) yields an optimal Hamiltonian 
Hq + eh . 

ii) The isosurfaces of the distribution function of the system can be approximated 
by constructing them from a set of 3-tori describable by an integrable Hamil- 
tonian Hq (but not necesssarily by an integrable potential). 

Then our $ can be expected to be a good approximation of the real potential 
of the system (as a regularizing constraint, we can choose, e.g., the smallness of 
\\8H\\). 

The possibility of constructing optimal tori and Hamiltonians holds another great 
advantage: a multitude of deviations from integrability can readily be modelled with 
a suitably tailored version of Hamiltonian perturbation theory [7]. This includes 
both structural detail (resonant orbit families and other details) and timelike irreg- 
ularities (deviations from steady state). 

In a forthcoming study, we will study more general and realistic systems such as 
axisymmetric and fully three-dimensional Stackel potentials. We will also introduce 
more general observational biases and other factors, and investigate their influence. 
The final goal is the combination of a general torus-construction machinery and 
an analysis procedure for dynamical tomography, so that we can work with any 
potentials. With such tools, we can analyze the data from large-scale surveys and 
construct a consistent mathematical model of the dynamics of our galaxy. The real 
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galactic problem can be expected to suffer from considerable model noise, so we 
should employ various forms of modelling the distribution functions and potential. 
In addition to mapping the density distribution of the dark matter in our galaxy, we 
should also be able to test whether distributions with alternative theories of gravity 
are distinguishable from standard models with dark matter. 
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